Bayesian variable selection for finite mixture model of linear regressions

Authors

  • Kuo-Jung Lee
  • Ray-Bing Chen
  • Ying Nian Wu
Abstract

We propose a Bayesian method for variable selection in the finite mixture model of linear regressions. The model assumes that the observations come from a heterogeneous population that is a mixture of a finite number of sub-populations. Within each sub-population, the response variable can be explained by a linear regression on the predictor variables, so the whole data set can be modeled by a mixture of linear regressions in which each mixture component follows a separate regression model. When the number of predictor variables is large, it is assumed that only a small subset of the variables is important for explaining the response variable. It is further assumed that different mixture components may need different subsets of variables to explain the response variable. This gives rise to a complex variable selection problem. We propose to solve this problem within the Bayesian framework by introducing two sets of latent variables. In the first set, each observation is associated with an indicator of which sub-population, or mixture component, it comes from. In the second set, within each mixture component, each predictor variable is associated with an indicator of whether that variable is included in the component's regression model. Variable selection can then be accomplished by sampling from the posterior distribution of the indicators together with the coefficients of the selected variables. We conduct simulation studies to demonstrate that the proposed method performs well in comparison with existing methods. We also analyze two real data sets to further illustrate the proposed method.
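
The two sets of latent indicators described above can be illustrated with a minimal Gibbs-sampling sketch in Python with NumPy. It is not the authors' algorithm: it assumes Gaussian errors, a known number of components K, independent Bernoulli(w) priors on the inclusion indicators, and a product parameterization in which component k uses the coefficients gamma[k] * beta[k]. The helper names (log_normal, sample_z, sample_gamma), the prior weight w, and the simulated data are all hypothetical. For brevity the coefficients, error scales and mixing weights are held fixed; a full sampler would also update them from conditionals that depend on priors the abstract does not specify.

```python
import numpy as np


def log_normal(y, mean, sigma):
    """Elementwise log density of N(mean, sigma^2) evaluated at y."""
    return -0.5 * np.log(2 * np.pi * sigma**2) - 0.5 * ((y - mean) / sigma) ** 2


def sample_z(X, y, pi, beta, gamma, sigma, rng):
    """Draw each component indicator z_i from p(z_i = k | rest).

    Shapes: X (n, p), y (n,), pi (K,), beta and gamma (K, p), sigma (K,).
    Component k uses the effective coefficients gamma[k] * beta[k].
    """
    means = X @ (gamma * beta).T                       # (n, K) fitted values
    logp = np.log(pi) + log_normal(y[:, None], means, sigma[None, :])
    logp -= logp.max(axis=1, keepdims=True)            # avoid underflow in exp
    prob = np.exp(logp)
    prob /= prob.sum(axis=1, keepdims=True)
    u = rng.random(len(y))[:, None]                    # inverse-CDF categorical draw
    z = (prob.cumsum(axis=1) < u).sum(axis=1)
    return np.minimum(z, len(pi) - 1)                  # guard against rounding


def sample_gamma(X, y, z, beta, gamma, sigma, w, rng):
    """Update each inclusion indicator gamma[k, j] given z, beta and sigma.

    Assumes independent Bernoulli(w) priors on gamma and the product
    parameterization gamma * beta, so the full conditional is Bernoulli.
    """
    gamma = gamma.copy()
    K, p = beta.shape
    for k in range(K):
        Xk, yk = X[z == k], y[z == k]                  # observations in component k
        for j in range(p):
            ll = np.empty(2)
            for g in (0, 1):                           # log-likelihood with j out / in
                gamma[k, j] = g
                resid = yk - Xk @ (gamma[k] * beta[k])
                ll[g] = log_normal(resid, 0.0, sigma[k]).sum()
            log_odds = np.log(w) - np.log1p(-w) + ll[1] - ll[0]
            prob_in = 0.5 * (1.0 + np.tanh(0.5 * log_odds))  # stable logistic
            gamma[k, j] = int(rng.random() < prob_in)
    return gamma


# Hypothetical simulated example: K = 2 components, p = 5 predictors.
rng = np.random.default_rng(0)
n, p, K = 400, 5, 2
X = rng.normal(size=(n, p))
beta_true = np.array([[3.0, 0.0, -2.0, 0.0, 0.0],
                      [0.0, -3.0, 0.0, 2.5, 0.0]])
gamma_true = (beta_true != 0).astype(int)
sigma = np.array([0.5, 0.5])
pi = np.array([0.5, 0.5])
z_true = rng.choice(K, size=n, p=pi)
y = np.einsum("ij,ij->i", X, beta_true[z_true]) + sigma[z_true] * rng.normal(size=n)

# beta is held fixed here; slots of irrelevant variables carry a nonzero
# value (1.5) so that including them visibly worsens the fit.
beta = np.where(gamma_true == 1, beta_true, 1.5)
gamma = np.ones((K, p), dtype=int)                     # start with every variable in
for _ in range(10):
    z = sample_z(X, y, pi, beta, gamma, sigma, rng)
    gamma = sample_gamma(X, y, z, beta, gamma, sigma, 0.5, rng)

print(gamma)
print("assignment accuracy:", np.mean(z == z_true))
```

On this toy data the inclusion pattern in gamma is expected to settle on the nonzero pattern of beta_true after a few sweeps, with each component selecting a different subset of predictors; the exact draws depend on the random seed.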

Similar resources

The Negative Binomial Distribution Efficiency in Finite Mixture of Semi-parametric Generalized Linear Models

Introduction: Selecting the appropriate statistical model for the response variable is one of the most important problems in finite mixtures of generalized linear models. One distribution that poses a problem in a finite mixture of semi-parametric generalized statistical models is the Poisson distribution. In this paper, to overcome overdispersion and computational burden, finite ...

An Overview of the New Feature Selection Methods in Finite Mixture of Regression Models

Variable (feature) selection has attracted much attention in contemporary statistical learning and recent scientific research. This is mainly due to the rapid advancement in modern technology that allows scientists to collect data of unprecedented size and complexity. One type of statistical problem in such applications is concerned with modeling an output variable as a function of a sma...

Model Selection for Mixture Models Using Perfect Sample

We have considered a perfect sample method for model selection of finite mixture models with either a known (fixed) or an unknown number of components, which can be applied in the most general setting with assumptions on the relation between the rival models and the true distribution. Both, one, or neither of the rival models may be well-specified or misspecified, and they may be nested or non-nested. We consider mixt...

QSAR studies and application of genetic algorithm - multiple linear regressions in prediction of novel p2x7 receptor antagonists’ activity

Quantitative structure-activity relationship (QSAR) models were employed to predict the activity of P2X7 receptor antagonists. A data set consisting of 50 purine derivatives was used in model construction, with 40 and 10 of these compounds in the training and test sets, respectively. A suitable group of calculated molecular descriptors was selected by employing stepwise multiple ...

Bayesian Mixture of Probabilistic Linear Regressions for Voice Conversion

The objective of voice conversion is to transform the voice of one speaker to make it sound like another. The GMM-based statistical mapping technique has proved to be an efficient method for converting voices [1, 2]. In a recent work [3], we generalized this technique to a Mixture of Probabilistic Linear Regressions (MPLR) by using a general mixture model of source vectors. In this paper, we i...

Journal:
  • Computational Statistics & Data Analysis

Volume: 95

Publication date: 2016